DSZOOM – Low Latency Software-Based Shared Memory

Authors

ZORAN RADOVIĆ

Erik Hagersten

Abstract

Software implementations of shared memory are still far behind the performance of hardware-based shared memory implementations and are not viable options for most fine-grain shared-memory applications. The major source of their inefficiency is the cost of interrupt-based asynchronous protocol processing, not the actual network latency. As the raw hardware latency of inter-node communication decreases, the asynchronous overhead in the communication becomes more dominant. Elaborate schemes, involving dedicated hardware and/or dedicated protocol processors, have been suggested to cut the overhead. This paper describes how the asynchronous overhead can be removed completely by instead running the entire coherence protocol in the requesting processor. This not only removes the asynchronous overhead, but also makes use of a processor that would otherwise stall. The technique is applicable to both page-based and fine-grain software shared memory. Our proof-of-concept implementation, DSZOOM-EMU, is a fine-grained software-based shared memory. It demonstrates a protocol-handling overhead below a microsecond for all the actions involved in a remote load operation, compared with around ten microseconds for the fastest implementation to date. The all-software protocol is implemented assuming only some basic low-level primitives in the cluster interconnect. Based on a remote atomic operation and simple remote put/get operations, the requesting processor can assume the role of the directory agent, traditionally assumed by a remote protocol agent in the home node in other implementations. The implementation is thread-safe and allows all processors in a node to perform remote operations simultaneously.
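The idea that the requesting processor acts as the directory agent can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the DSZOOM protocol itself: the `Node` class, the `LOCKED` sentinel, the three primitive names (`remote_swap`, `remote_get`, `remote_put_dir`) and the sharer-set directory format are all hypothetical, and a `threading.Lock` stands in for the atomicity that the cluster interconnect would provide. The home node runs no protocol code; the requester locks the directory entry with a remote atomic, fetches the data with a get, and writes the updated entry back with a put.

```python
import threading

# Hypothetical sentinel stored in a directory entry while a requester updates it.
LOCKED = object()


class Node:
    """One simulated node: local memory plus directory entries for lines it is home to."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.memory = {}      # line address -> data
        self.directory = {}   # line address -> frozenset of sharer ids, or LOCKED
        self._hw = threading.Lock()  # stands in for interconnect atomicity (assumption)

    # The only three primitives a requester may use on a remote node.
    def remote_swap(self, addr, new):
        """Remote atomic: store `new` in the directory entry, return the old value."""
        with self._hw:
            old = self.directory[addr]
            self.directory[addr] = new
            return old

    def remote_get(self, addr):
        """Remote get: read a line from this node's memory."""
        return self.memory[addr]

    def remote_put_dir(self, addr, value):
        """Remote put: overwrite the directory entry."""
        with self._hw:
            self.directory[addr] = value


def remote_load(requester, home, addr):
    """Run the whole (simplified) load protocol on the requesting side."""
    # 1. Lock the directory entry; retry while another requester holds it.
    while True:
        sharers = home.remote_swap(addr, LOCKED)
        if sharers is not LOCKED:
            break
    # 2. Fetch the data and install a local copy.
    data = home.remote_get(addr)
    requester.memory[addr] = data
    # 3. Publish the new sharer set; this also releases the lock.
    home.remote_put_dir(addr, sharers | {requester.node_id})
    return data
```

In this sketch no thread ever runs on behalf of the home node, so there is no interrupt or polling agent to wait for. Several requesters can call `remote_load` concurrently because the swap in step 1 serializes their updates to the directory entry.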

Similar resources

Implementing Low Latency Distributed Software-Based Shared Memory

Software implementations of shared memory are still far behind the performance of hardware-based shared memory implementations (HW-DSM) and are not viable options for most fine-grain shared memory applications. The major source of their inefficiency is the cost of interrupt-based asynchronous protocol processing, not the actual network latency. As the raw hardware latency of inter...

Latency-hiding and Optimizations of the DSZOOM Instrumentation System

An efficient and robust instrumentation tool (or compiler support) is necessary for an efficient implementation of fine-grain software-based shared memory systems (SW-DSMs). The DSZOOM system, developed by the Uppsala Architecture Research Team (UART) at Uppsala University, is a sequentially consistent fine-grained SW-DSM originally developed using Executable Editing Library (EEL)—a binary modi...

Evaluation, Implementation and Performance of Write Permission Caching in the DSZOOM System

Fine-grained software-based distributed shared memory (SW-DSM) systems typically maintain coherence with in-line checking code at load and store operations to shared memory. The instrumentation overhead of this added checking code can be severe. This paper (1) shows that most of the instrumentation overhead in the fine-grained DSZOOM SW-DSM system is store related, (2) introduces a new write per...

Exploiting Spatial Store Locality Through Permission Caching in Software DSMs

Fine-grained software-based distributed shared memory (SW-DSM) systems typically maintain coherence with in-line checking code at load and store operations to shared memory. The instrumentation overhead of this added checking code can be severe. This paper (1) shows that most of the instrumentation overhead in the fine-grained SW-DSM system DSZOOM is store-related, (2) introduces a new write per...

Efficient Synchronization and Coherence for Nonuniform Communication Architectures

Nonuniformity is a common characteristic of contemporary computer systems, mainly because of physical distances in computer designs. In large multiprocessors, the access to shared memory is often nonuniform, and may vary as much as ten times for some nonuniform memory access (NUMA) architectures, depending on if the memory is close to the requesting processor or not. Much research has been devo...

Publication date: 2001
